There has been some controversy over whether fileIDs provided by Pilot should be unique
over all Xerox systems. The original Pilot Concepts and Facilities [1] specified that a FileID
would contain 64 bits and would be "unique over all time". The first draft of the Pilot
Functional Specification [2] merely said that a FileID would be "unique over the life of the
system". During the file working group meetings that followed the first draft, however, it
became apparent that it might be valuable to have the FileID unique over all time and space.
We only have the power to make this cover all Xerox OIS systems, however. This seemed
particularly appealing since it did not appear necessary to increase the size of the FileID
significantly in order to achieve this added generality. Accordingly plans began to be made
to use a 64 bit FileID that would be unique over all Xerox OIS systems that use Pilot [3].
Some people have had reservations about the need to go to the effort that is needed to
ensure that the FileIDs be unique over all Xerox systems [4]. Others have felt strongly
enough about the issue, however, to produce a document that sketches some of the issues
involved [5]. The issue is currently unresolved as can be seen from the current draft of the
Pilot Functional Specification [6] which does not specify what the range of uniqueness is,
but assumes that if a file moves to another system element it may be possible for software
to automatically discover its new location. This note proposes mechanisms for making
FileIDs unique over all Xerox systems without using more than 64 bits for the FileID.

Why Span All Xerox Systems? 

The justification behind the use of fileIDs that are unique over all Xerox systems is that it
satisfies a need that would otherwise exist to translate between the separate ID spaces on
different Xerox systems. When two computer systems are tightly coupled, it is obvious that
they should share a single fileID space (this might be implemented merely by prefixing a
system identifier to the fileIDs normally used on each system). By providing a single ID
space for all systems, however, we provide three features: 1) systems can be dynamically
reconfigured easily, 2) disk packs can be moved from system to system easily, and 3) the size
of the fileID is independent of the number of systems that are tightly coupled together. The
cost of providing fileIDs that are unique over all Xerox systems is in two parts: a) the
FileID must be somewhat larger than if each system had its own fileID space (this cost is
surprisingly small, however), and b) allocation of fileIDs must be done more carefully to
prevent one system from allocating fileIDs that are being allocated by another system that
may reside in a foreign country. The storage cost for a general fileID is so small because
the fileID for a single system must already be rather large, and the number of files that
can be handled by a single ID space increases exponentially with the size of the fileID. Since
the total amount of storage used by fileIDs is reasonably small, the cost of the larger fileID
is not severe even if the fileID must be as much as a factor of two larger. In fact, the
increase in the size of the fileID in order to span all Xerox systems is probably less than
30%.

General Approach

Whether the fileID spans only a single system or all Xerox systems, it is necessary to choose
a size for the fileID that is large enough to serve the range of systems that Xerox would like
to provide. Actually the range of the performance of the systems Xerox would like to
provide is rather broad and the rate of generation of fileIDs is very hard to predict. Rather
than trying to derive the appropriate size of a fileID, I will instead propose a size that I
consider to be adequate and then defend the adequacy of this size. The basic technique I will
use is to determine an upper bound on the rate of allocation of fileIDs and provide enough
fileIDs to last for the lifetime of the product. Thus I do not pretend to have found a least
upper bound on the number of fileIDs that are needed, I merely suggest that there is a high
probability that the size proposed is larger than the true least upper bound on the number of
fileIDs that will be allocated during the lifetime of the OIS product line.

How Large is 2^64?

The size of unique fileID that I would like to defend is 64 bits. The basic reason that I feel
that this size is adequate is because when 2^64 is divided by 10^6, the resulting number is
roughly the number of milliseconds in 500 years. This means that using a 64 bit fileID, we
could support one million simultaneously and continuously operating processors for 500 years,
assuming that the long-term average rate of file creation on a single processor is one per
millisecond, or 3.6 million files per hour. One million systems would provide about 2% of the
entire work-force in the United States with a personal computer.
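
The arithmetic behind this claim can be checked with a short sketch. It assumes a 365.25-day year, and the variable names are purely illustrative. Under that assumption the quotient comes out at about 585 years, so the figure of 500 years quoted above errs on the safe side.

```python
# Back-of-the-envelope check of the 64-bit claim.
# Assumption: a year is 365.25 days long.
ID_SPACE = 2**64
PROCESSORS = 10**6                        # one million processors
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

ids_per_processor = ID_SPACE // PROCESSORS
years_at_one_per_ms = ids_per_processor / MS_PER_YEAR
files_per_hour = 1000 * 60 * 60           # one file per millisecond

print(f"{years_at_one_per_ms:.0f} years per processor at one file per ms")
print(f"{files_per_hour:,} files per hour")
```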

The Size of the Excessive Safety Factor 

The above numbers are all very conservative worst-case estimates. In fact, they are so
absurdly conservative that we can estimate an excessive safety factor that could be removed
from the above estimates while still providing an adequate safety factor in the use of
fileIDs. The worst-case estimate of total product lifetime of 500 years is probably too large
by a factor of 10 (we should expect to be using a different operating system in at least 50
years, at which point the number of running systems will decrease as we switch to the new
system). Similarly, the worst-case long-term average of file creation is probably too high by
a factor of 100 even assuming that systems do actually operate continuously. Finally, the
assumption of continuous operation is ridiculous when considering personal computers.
Even time-shared systems do not actually run more than about 1/4 of the time, thus this
estimate is too large by a factor of 10. Thus we find that it is almost certain that 2^64 is
more than 10^4 times the total number of files that will be created on all OIS products (that
run Pilot) combined.
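
As a quick check, the three factors named in this section multiply out as follows. The variable names are illustrative only; the factors themselves are the ones stated above.

```python
# Excess safety factors named in the text.
lifetime_factor = 10        # 500 years assumed versus roughly 50 expected
creation_rate_factor = 100  # one file per millisecond is far too high
duty_cycle_factor = 10      # machines do not run continuously

excess_safety_factor = lifetime_factor * creation_rate_factor * duty_cycle_factor

# Rough ceiling on files ever created if the excess is removed from 2**64.
plausible_total_files = 2**64 // excess_safety_factor
print(excess_safety_factor, plausible_total_files)
```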

The Problem of Allocating FilelDs 

Some of this excessive safety factor must be used to ensure that no two OIS systems allocate
the same fileID. There is a fundamental requirement that there be a single, central place in
the world that is the highest authority for allocating fileIDs. This is obviously unacceptable
for the allocation of individual fileIDs, however. It is possible, however, for the central
allocator to delegate allocation authority over portions of the ID space to less central
allocators. Each allocator thus should have a range of fileIDs that it may allocate. One
allocator may allocate a subrange of fileIDs to another allocator, but only if the original
allocator then behaves as if all of those fileIDs have been allocated to actual files.

The central allocation problem is solved by finding a system of delegating allocation
authority to progressively less centralized allocators. A system of allocators consisting of
three levels seems to be adequate. The Xerox factory is, of course, the highest level. Actually,
there may be several Xerox factories, thus requiring that some care be given to the allocation
problem at this level as well. The second level of allocator should be associated with the
individual processor. It can be argued that there is no need for any allocators below the level
of the individual processor, but it would be nice for a fileID to give a hint about what disk
volume it resides on. Such a hint can be obtained if each disk volume contains the lowest
level allocator of fileIDs. Thus the allocator in each system allocates a series of fileIDs to a
disk volume.
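
The three-level delegation described above can be sketched as follows. This is a minimal illustration, not the Pilot implementation: the class name RangeAllocator, its methods, and the block sizes of 2**40 per processor and 2**16 per volume are assumptions made for the example.

```python
class RangeAllocator:
    """Hands out fileIDs from the half-open range [next_free, limit)."""

    def __init__(self, start, limit):
        self.next_free = start
        self.limit = limit

    def allocate(self, count=1):
        """Consume `count` consecutive IDs and return the first one."""
        if self.next_free + count > self.limit:
            raise RuntimeError("allocator exhausted")
        first = self.next_free
        self.next_free += count
        return first

    def delegate(self, count):
        """Give a subrange to a lower-level allocator.

        The parent advances past the whole subrange, i.e. it behaves as if
        every delegated ID had already been assigned to a file.
        """
        first = self.allocate(count)
        return RangeAllocator(first, first + count)


factory = RangeAllocator(0, 2**64)       # highest authority
processor = factory.delegate(2**40)      # second level (assumed block size)
volume = processor.delegate(2**16)       # lowest level (assumed block size)
file_id = volume.allocate()              # an ID for one actual file
```

Because a parent never hands out the same subrange twice, two volumes can never produce the same fileID, no matter which processors they are attached to.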

Crash-Proof Allocators 

A basic problem with a multi-level fileID allocator arises if it is possible for an allocator to
be destroyed. When an allocator is destroyed, we must assume that all of the fileIDs it was
responsible for were allocated. Thus the higher the level of the allocator, the more serious is
the destruction of the allocator because the destruction of the allocator effectively allocates
more fileIDs. In addition, it is particularly painful if the allocator that resides in a processor
is destroyed because the operation of allocating fileIDs from a Xerox factory to the
processor is a very slow, expensive operation. Thus it is necessary for the allocator in the
processor to be able to survive system crashes of all kinds. This requires the allocator to be
stored in non-volatile memory. Current technology would allow a small portion of the
system hardware to be devoted to providing a special counter in EAROM (electrically
alterable read-only memory). Current technology would only allow such a counter to be
incremented a total of 10^ times, but this limitation can be programmed around. If we have
designed the allocation system so that high level allocators are not lost, then the allocator
that is at the factory should be designed very carefully as well.

The System of Allocators 

Since we would like to handle a maximum of 10^6 simultaneously operating processors, it
seems prudent to allocate only 2^40 fileIDs to each processor. This allows us to allocate
fileIDs from the factory a total of 2^24 (or 16 x 10^6) times before we have exhausted our ID
space. We can only allocate as many as 2^40 fileIDs to a processor if we are reasonably sure
that the allocator in a single processor will not be destroyed.

The next question is how many fileIDs should be allocated to a disk volume at one time. If
we are too generous, there is an opportunity here for fileIDs to be allocated to allocators
that will never allocate the fileIDs to actual files. We must ensure that at least 1% of fileIDs
allocated to leaf-node allocators (i.e. disk volumes) actually be assigned to files. Since a
processor is tightly coupled to its disk, it would be possible for the processor to allocate
arbitrarily small blocks of fileIDs to disk volumes. If the processor allocates too few fileIDs
to a disk volume at once, however, then the fileID will not provide any useful hint about the
location of the file. These two factors seem to be well balanced if we allocate fileIDs to disk
volumes in blocks of 2^16 fileIDs. 1% of these fileIDs will be assigned to actual files if 600
files are created on the volume or if, on the average, 1% of disk volumes allocate all 2^16
fileIDs.

This allows a processor to allocate blocks of 2^16 fileIDs to disk volumes a total of 2^24 times
before the total set of 2^40 fileIDs has been exhausted. A processor could initialize disk
volumes at the rate of one per second continuously for 6 months before exhausting its
allocation. If the long term average rate of initializing disk volumes is as low as one per
minute, however, then the processor's allocation of 2^40 fileIDs would last for 30 years.
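
The figures in this section can be reproduced with a few lines of arithmetic. The sketch assumes that the block sizes are 2^40 fileIDs per processor and 2^16 per disk volume, which is consistent with the quoted 600 files, 6 months and 30 years, and it assumes a 365.25-day year.

```python
PER_PROCESSOR = 2**40   # assumed fileIDs granted to one processor
PER_VOLUME = 2**16      # assumed fileIDs granted to one disk volume
SECONDS_PER_DAY = 86_400

blocks_per_processor = PER_PROCESSOR // PER_VOLUME

# One volume initialized per second, continuously.
days_at_one_per_second = blocks_per_processor / SECONDS_PER_DAY

# One volume initialized per minute on average.
years_at_one_per_minute = blocks_per_processor * 60 / (SECONDS_PER_DAY * 365.25)

# Files needed on a volume for 1% of its block to be used.
one_percent_of_block = PER_VOLUME // 100

print(blocks_per_processor, days_at_one_per_second,
      years_at_one_per_minute, one_percent_of_block)
```

Under these assumptions a processor has 2^24 blocks to hand out, which last about 194 days at one volume per second and about 32 years at one volume per minute.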

What happens if, for some strange reason, a processor does in fact use up its entire
allocation of fileIDs? It will probably be necessary to replace the EAROM chip at this point
because its counter will have been modified as many times as possible. At the same time as
the EAROM chip is being replaced, a new series of 2^40 fileIDs must be allocated to the
processor. Both of these operations must be performed by Xerox maintenance personnel. It
is the desire to reduce the need for this maintenance operation, especially on high
performance systems, that causes us to allocate as many as 2^40 fileIDs to each processor. It
seems prudent to replace the EAROM chip when 80% of the fileIDs have been allocated.
The resulting percentage wastage of fileIDs is very small.

Representations for Allocators 

Although this analysis seems reasonable, it is quite clear that there is no very good basis for
choosing 2^40 fileIDs as the number that should be allocated to a processor. The choice of
2^16 fileIDs per disk is also a pure guess. These guesses are good enough to start from, but we
should maintain the ability to change the number of fileIDs that are allocated at once to a
processor and to a disk volume. It may even become necessary to allocate different numbers
of fileIDs to different processors or different disk volumes. If we keep track of the date on
which an allocation of fileIDs was made to a processor or a disk volume, then the number
of fileIDs that should be allocated when this initial allocation is exhausted can be
automatically calculated from the observed average rate of usage of fileIDs combined with
the length of time desired before another allocation of fileIDs to this processor or disk
volume is needed.
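
One way the automatic calculation might look is sketched below. The function name, its parameters, and the decision to round up to a power of two (so that blocks stay aligned) are assumptions made for illustration; the text above only requires that the new size follow from the observed rate and the desired interval.

```python
from datetime import date


def next_allocation_size(ids_used, allocated_on, today, target_days,
                         minimum=2**16):
    """Estimate how many fileIDs to grant next from the observed usage rate."""
    elapsed_days = max((today - allocated_on).days, 1)
    rate_per_day = ids_used / elapsed_days
    needed = int(rate_per_day * target_days)
    # Assumption: sizes are powers of two and never smaller than `minimum`.
    size = minimum
    while size < needed:
        size *= 2
    return size


# A volume that used 2**16 IDs in 100 days and should now last ten years.
size = next_allocation_size(2**16, date(1978, 1, 1), date(1978, 4, 11), 3650)
print(size)
```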

The allocator that resides on a disk volume should thus contain four numbers: a) the first
fileID in the sequence allocated to the disk, b) the maximum fileID in the sequence allocated
to the disk, c) the next free fileID, and d) the date at which the sequence of fileIDs was
allocated to the disk volume. Once this information is maintained, we have a great deal of
flexibility to construct allocators that will waste very few fileIDs. Newer, better allocators
could be put into a system merely by changing a few small programs, i.e. the allocators
themselves.
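
A minimal sketch of the four-number volume allocator follows. The class and field names are hypothetical, and treating the maximum fileID as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VolumeAllocator:
    first_id: int        # a) first fileID in the sequence
    max_id: int          # b) maximum fileID in the sequence (assumed inclusive)
    next_free: int       # c) next free fileID
    allocated_on: date   # d) date the sequence was allocated to the volume

    def allocate(self):
        """Return the next free fileID, or fail when the block is used up."""
        if self.next_free > self.max_id:
            raise RuntimeError("volume allocation exhausted")
        file_id = self.next_free
        self.next_free += 1
        return file_id

    def usage_rate(self, today):
        """Average number of fileIDs consumed per day since allocation."""
        days = max((today - self.allocated_on).days, 1)
        return (self.next_free - self.first_id) / days


volume = VolumeAllocator(2**16, 2**17 - 1, 2**16, date(1978, 1, 1))
first_file = volume.allocate()
second_file = volume.allocate()
```

The first and fourth fields are never needed to allocate an ID; they exist only so that a usage rate can be derived when the block runs out.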

The allocator contained in the processor has a different set of design constraints and so
should use a slightly different representation for the allocator. First, since the EAROM can
only be incremented a total of 10^ times, a group of fileIDs must be allocated at once. The
number of fileIDs to be allocated at once should be contained in the EAROM. Second, bits
in the EAROM will probably be at a premium. The allocator in EAROM must contain at
least three numbers: a) the next free fileID, b) the last free fileID, and c) the number of
fileIDs to allocate at once. We can save storage for the low order 16 bits of all three
numbers by forcing the minimum allocation of fileIDs from the processor to be 2^16 and
then aligning the allocations on 2^16 boundaries. If the maximum number of fileIDs that a
processor allocates at once is 2^24, then the number of fileIDs allocated at once can be
represented with 8 bits. If the maximum number of fileIDs allocated to a processor from the
factory is set at 2^40 and the minimum at 2^24, and if the next free fileID is represented by
48 bits (a 64 bit number on mod 2^16 boundary), then the last free fileID can be represented
with 16 bits (a 64 bit number on mod 2^24 boundary with the high order 24 bits derived
from the next free fileID). This allocator is exhausted if the next allocation would result in
the next free fileID having bits 24-40 equal to the 16 bits used to represent the last free
fileID. The other two numbers that a disk allocator maintains (the first fileID in the
original sequence and the date the sequence was allocated to the disk) need not be kept in
the EAROM. The maintenance personnel should make an attempt to keep these numbers,
but the system should also try to keep these numbers on the system disk. If a crash wipes out
these numbers on disk, it would be acceptable to replace them with the next free fileID and
the date of the crash. These two numbers still allow a rate of fileID usage to be calculated
even if they only give the rate of usage since the last major crash.
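
The compact EAROM representation can be illustrated with a pack/unpack pair. This is one possible reading of the layout, under stated assumptions: the next free fileID is stored without its low 16 bits (48 bits), the last free fileID is stored as its bits 24 through 39 only (16 bits), and the batch size is stored in units of 2^16 minus one so that sizes from 2^16 up to 2^24 fit in 8 bits. The function names and the minus-one encoding are not from the text.

```python
def pack(next_free, last_free, batch):
    """Reduce three 64-bit quantities to 48 + 16 + 8 bits of stored state."""
    assert next_free % 2**16 == 0 and batch % 2**16 == 0
    assert last_free % 2**24 == 0
    # The high-order 24 bits of both IDs must agree so they are stored once.
    assert next_free >> 40 == last_free >> 40
    next_field = next_free >> 16               # 48 bits
    last_field = (last_free >> 24) & 0xFFFF    # 16 bits
    batch_field = (batch >> 16) - 1            # 8 bits (assumed encoding)
    assert 0 <= batch_field < 2**8
    return next_field, last_field, batch_field


def unpack(next_field, last_field, batch_field):
    """Rebuild the full 64-bit values from the stored fields."""
    next_free = next_field << 16
    high24 = next_free >> 40
    last_free = (high24 << 40) | (last_field << 24)
    batch = (batch_field + 1) << 16
    return next_free, last_free, batch


next_free = (5 << 40) | (3 << 16)
last_free = (5 << 40) | (0xABCD << 24)
fields = pack(next_free, last_free, 2**24)
restored = unpack(*fields)
```

Under this reading the allocator state occupies 72 bits rather than the 192 bits that three full 64 bit numbers would need.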

System software has the option of allocating fileIDs to disks in units smaller than or larger
than the number allocated at once by the hardware allocator, but a system crash may destroy
the software allocator. If the allocator maintained by system software is destroyed, it must
be re-initialized from the hardware allocator in the processor. This initialization should be
done only when the system software actually needs to allocate fileIDs rather than during an
initialization phase of crash recovery.
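
The lazy re-initialization described here might be sketched as follows. Both classes are hypothetical stand-ins: HardwareCounter models the crash-proof counter in the processor, and SoftwareAllocator models the volatile allocator kept by system software.

```python
class HardwareCounter:
    """Stand-in for the non-volatile allocator in the processor."""

    def __init__(self, next_free, batch):
        self.next_free = next_free
        self.batch = batch

    def take_batch(self):
        """Advance the counter and return the half-open range just granted."""
        first = self.next_free
        self.next_free += self.batch
        return first, first + self.batch


class SoftwareAllocator:
    """Volatile allocator that refills itself only when an ID is needed."""

    def __init__(self, hardware):
        self.hardware = hardware
        self.next_free = self.limit = 0   # empty until first use

    def crash(self):
        # Volatile state is lost; the rest of the current batch is abandoned.
        self.next_free = self.limit = 0

    def allocate(self):
        if self.next_free == self.limit:  # empty or exhausted: refill now
            self.next_free, self.limit = self.hardware.take_batch()
        file_id = self.next_free
        self.next_free += 1
        return file_id


hardware = HardwareCounter(0, 2**16)
software = SoftwareAllocator(hardware)
```

Deferring the refill means that a machine which crashes and restarts repeatedly without creating files consumes no fileIDs at all.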



Unique FileIDs - Are 64 bits enough?



Conclusions 

The software that attempts to estimate on which disk volume a file resides should be written
so it will work when different volumes are allocated different numbers of fileIDs. The
limitations of this software will probably determine whether the fileID allocation strategy
can be changed easily or not. Since we do not know at this time how sophisticated the
allocation of fileIDs needs to be, this software should be written with enough generality to
allow very sophisticated allocators. By the time we have allocated fileIDs in blocks of 2^40 to
100,000 processors, we will have used up less than 1% of our fileID space but we will have
gained enough experience to be able to redesign the system of allocators (if necessary) so
that fewer fileIDs are wasted (allocated to allocators which then do not allocate them to
actual files).
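
A volume-hint lookup that tolerates blocks of different sizes could be built on a sorted table of ranges, as in the sketch below. The class VolumeHints and its methods are hypothetical, and the block size of 2^40 per processor used in the final check is an assumption consistent with the "less than 1%" figure above.

```python
import bisect


class VolumeHints:
    """Maps a fileID to the volume whose allocated range contains it."""

    def __init__(self):
        self._starts = []   # sorted first fileIDs of recorded ranges
        self._ranges = []   # parallel list of (first, limit, volume)

    def record(self, first, count, volume):
        """Remember that [first, first + count) was granted to `volume`."""
        i = bisect.bisect_left(self._starts, first)
        self._starts.insert(i, first)
        self._ranges.insert(i, (first, first + count, volume))

    def guess(self, file_id):
        """Return the likely volume for `file_id`, or None if unknown."""
        i = bisect.bisect_right(self._starts, file_id) - 1
        if i >= 0:
            first, limit, volume = self._ranges[i]
            if file_id < limit:
                return volume
        return None


hints = VolumeHints()
hints.record(0, 2**16, "A")        # a block of the default size
hints.record(2**16, 2**20, "B")    # a larger block for a busier volume

# Fraction of the 64-bit space used after 100,000 processors get 2**40 each.
fraction_used = 100_000 * 2**40 / 2**64
```

Because the table stores explicit range limits, nothing in the lookup depends on every volume receiving the same number of fileIDs.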

Although I mentioned earlier that the percentage of storage used for fileIDs is not very
large, the current plans include storing a fileID in the header of every disk page. Thus the
total amount of storage used for fileIDs will be considerable even if it is only 3% of the
total storage. We should make an attempt, then, to keep the size of the fileID small without
placing the future utility of the fileIDs at risk. I hope that I have shown that the fileID does
not need to be larger than 64 bits. If the scheme seems to require mildly complex hardware,
the savings in storage should cover any added hardware cost. I would like to point out that
similar problems occur even if the fileID is unique over a single system. Most of the fileID
is designed for the use of a single system. The existence of the high order bits in the fileID
allows the number of fileIDs that are normally allocated to a single system to be smaller
than if we were guaranteeing uniqueness over a single system. We need not be concerned
about the possibility of the 10^5th processor that is delivered being kept by its owner for 200
years. Thus removing the requirement of uniqueness of fileIDs over all Xerox systems will
not significantly reduce the costs of managing the allocation of fileIDs or even significantly
reduce the costs of storing fileIDs.